Description
In Chrome, you can cancel the loading of a page by clicking the X that replaces the refresh button while the page is loading.
Some websites keep on loading, and even after 90s I keep getting timeout errors.
If there were an option to stop loading the page (like there is in Chrome), I would get the content that was already loaded and prevent Puppeteer from throwing a timeout.
I tried to use page.keyboard.press('Escape'); but with no luck.
Another solution would be to stop loading the page after X ms with something like this:
page.setPageLimitLoadingTime(30000);
which will stop the page from continuing the loading process and return all the data it already got...
Chromium API reference:
https://chromedevtools.github.io/devtools-protocol/tot/Page#method-stopLoading
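A minimal sketch of how the protocol method linked above might be called from a script, assuming `page` is a Puppeteer `Page`. The helper name `stopLoading` is hypothetical and not part of Puppeteer; the sketch goes through a public CDP session rather than a private field.

```javascript
// Hypothetical helper (not a Puppeteer API): ask Chromium to stop loading
// the page, roughly what the browser's "X" button does.
async function stopLoading(page) {
  // Newer Puppeteer versions expose page.createCDPSession(); older ones
  // go through page.target().createCDPSession().
  const client = typeof page.createCDPSession === 'function'
    ? await page.createCDPSession()
    : await page.target().createCDPSession();
  try {
    await client.send('Page.stopLoading');
  } finally {
    // Release the session so sessions do not accumulate across calls.
    await client.detach();
  }
}
```

Whether a pending `page.goto` promise settles after this call depends on the `waitUntil` option in use, so the caller should not rely on it resolving.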
Tell us about your environment:
- Puppeteer version: 1.8
- URLs (if applicable): https://bestmodelsbrasil.blogspot.co.il (website with more than 90s loading time)
- Node.js version: 8
Thank you.
** if there already is an option for my proposal I'm sorry, I just couldn't find anything...
Activity
aslushnikov commented on Sep 13, 2018
@RebliNk17 there's a `window.stop` method, would it be helpful to you?
RebliNk17 commented on Sep 13, 2018
Correct me if I'm wrong, but I think that the `evaluate` function only runs after `page.goto` finishes... Anyway, it's not working.
Something like this is partially working:
`await page._client.send("Page.stopLoading");`
but I cannot find a way to tell puppeteer that the page has finished loading...
RebliNk17 commented on Sep 16, 2018
I don't know how or why, but this code now works:
`await page._client.send("Page.stopLoading");`
It stops loading the page and returns all the data from `goto`...
A few days ago it didn't return any data and threw `timeout`, but now it does...
RebliNk17 commented on Sep 16, 2018
Sorry, not working as I thought.
When using this flag: `networkidle0` or `networkidle2` in `page.goto`, I am still getting a timeout.
When using 'domcontentloaded' or 'load' I'm not getting all the data from some websites, but then I'm not getting `timeout` errors.
@aslushnikov Any thoughts on how to do it?
I've tried this:
https://github.com/RebliNk17/puppeteer/blob/master/lib/Page.js
But I'm still missing something...
aslushnikov commented on Sep 16, 2018
@RebliNk17 what do you expect to see when you "stop" loading?
If you just want the navigation promise to not hang, I'd implement stopping somehow like this:
RebliNk17 commented on Sep 16, 2018
@aslushnikov
I'll try to explain my problem better.
What I want is to receive the website HTML content and HTTP requests from the `page.goto` promise after X seconds have passed, without an exception, even if the page did not finish loading.
Currently, if the page did not finish loading, a `Timeout` exception is thrown and no data (HTML / HTTP requests etc.) is returned.
Expected result:
When stopLoading is called, the `page` will stop all processes (just like when pressing ESC or the X button in a regular browser) and will "display" all the content that has been loaded until that press.
Is it clearer now?
If not, I will create a short video to explain it (English is not my native language).
aslushnikov commented on Sep 16, 2018
@RebliNk17 I'm still not sure what's not working.
So the following approach should yield the expected result:
- `await page._client.send("Page.stopLoading");` will stop page loading, as if you hit "X" in the browser
- `await page.content()`
- `Timeout` exception from `page.goto`:
So what's not working?
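The steps listed above could be combined roughly as follows. This is only a sketch: `contentWithinBudget` is a hypothetical helper name, the check on `error.name` assumes Puppeteer's timeout errors are named `TimeoutError`, and `page._client` is the private field used in this thread, which may differ between Puppeteer versions.

```javascript
// Hypothetical helper: navigate with a time budget; on timeout, stop the
// load and return whatever HTML the page holds at that point.
async function contentWithinBudget(page, url, budgetMs) {
  try {
    await page.goto(url, { waitUntil: 'networkidle2', timeout: budgetMs });
  } catch (error) {
    // Assumption: timeout errors carry the name 'TimeoutError'.
    // Anything else (DNS failure, bad URL, ...) is rethrown unchanged.
    if (error.name !== 'TimeoutError') throw error;
    // Private field, as used in this thread; not a stable public API.
    await page._client.send('Page.stopLoading');
  }
  return page.content();
}
```

Catching only the timeout keeps genuine navigation failures visible to the caller instead of silently returning an empty document.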
vsemozhetbyt commented on Sep 16, 2018
Maybe `page.goto(url, { waitUntil: 'domcontentloaded' })` will suffice?
RebliNk17 commented on Sep 17, 2018
That's not loading all the JavaScript in the page.
When using `networkidle0` or `networkidle2` that's not enough; it will still hang and throw a `timeout` exception.
This will still hang until `timeout`.
What I found to be working is something like this:
I changed the code a little bit in `lib\Page.js#goto`
This now waits for page loading to finish or for loading to stop, and does not hang at all.
Is it possible to add it to the official `puppeteer` API?
RebliNk17 commented on Oct 2, 2018
@aslushnikov Any thoughts on the code I shared above?
It works as expected, but I'm not sure if there is a better approach for that...
If it's good, should I create a PR?
aslushnikov commented on Oct 4, 2018
@RebliNk17 sorry for the delay, I was busy with other stuff.
Can we step back and reiterate, since I still don't understand what's not working?
If I understand correctly, there's a website that takes a lot of time to load. We want to constrain the wait time to a certain amount and get content from the page after this time. Is this correct?
If yes, why's the following not working for you?
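A minimal sketch of the approach described here (cap the wait, then read the page), assuming `page` is a Puppeteer `Page`. It is an illustration only and not necessarily the snippet this comment refers to; `contentAfterAtMost` is a hypothetical helper name.

```javascript
// Hypothetical helper: wait for the navigation for at most `budgetMs`, then
// read the HTML whether or not the navigation has finished.
async function contentAfterAtMost(page, url, budgetMs) {
  let timer;
  const budget = new Promise((resolve) => { timer = setTimeout(resolve, budgetMs); });
  // timeout: 0 turns off Puppeteer's own navigation timeout; the budget
  // replaces it. Navigation errors are swallowed so that a late rejection
  // does not surface as an unhandled rejection.
  const navigation = page.goto(url, { timeout: 0 }).catch(() => null);
  await Promise.race([navigation, budget]);
  clearTimeout(timer);
  return page.content();
}
```

Because navigation errors are swallowed, this variant cannot tell a slow page from a failed one; `page.content()` may also throw if the page is mid-navigation when it is called.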
RebliNk17 commented on Oct 11, 2018
Sorry, I did not get any notification about your comment.
Your code will work, but sometimes the stopping condition might not be a fixed time; it can also depend on other code running in the background, as in my situation.
Adding this "stopPageLoading" which exists in the Chromium API, will make it possible...
It's something that Puppeteer should have...
RebliNk17 commented on Oct 24, 2018
@aslushnikov Any thoughts?
I see two people voted for this...
nylen commented on Jun 3, 2019
My specific use case: I was building a web archiving tool that (ideally) should work with arbitrary pages, and I found there are certain kinds of navigation timeouts that can be avoided or shortened, like when a page is stuck `Connecting...` to a resource that's in the main rendering path. I think the issue in the OP is similar.
I agree there are other things that could cause navigations after a page is "stopped". I am assuming that "aborting all current in-flight requests" is good enough for my use case, and so far it seems to be working. For this part `page._client.send('Page.stopLoading')` is fine, but it was a bit hard to track down the correct call. At least now that is documented in this issue.
So I am mostly just looking for potential ways to improve the code of puppeteer users here. Hence the suggestion to make `page.goto` aware of "navigation aborted" events, because I think this would allow getting rid of the `Promise.race` in the examples above.
I don't think any of this is particularly urgent. Thanks for all of your work on Puppeteer.
superryeti commented on Jul 30, 2019
@aslushnikov Thank you for this
I am using pyppeteer and had the same problem (couldn't think of a way to get the DOM and cookies after a timeout). This solved my problem. I can access the DOM with
and cookies by
I don't understand what everyone else is complaining about. Again, thank you so much. Saved me a couple of hours.
sheikalthaf commented on Dec 1, 2019
I tried your solution and it works well, but when I try to take a screenshot I get an error
Mister-Fil commented on Jan 11, 2023
Stop page loading and/or something else; this can also close the `alert()`
If it doesn't work, then duplicate the line several times
otachkin commented on Mar 31, 2023
Can someone help me stop this page from loading continuously?
https://mbd.baidu.com/newspage/data/landingpage?s_type=news&dsp=wise&context=%7B%22nid%22%3A%22news_9644758218931914527%22%7D&pageType=1&n_type=1&p_from=-1&rec_src=52
`await page.evaluate(() => window.stop());`
Not working, puppeteer just gets stuck.
wesleyscholl commented on Apr 24, 2023
This worked for me:
Thanks!
heaven commented on Nov 18, 2023
@aslushnikov The problem is that when setting a timeout with `page.goto`, even when it fails with TimeoutError, the page keeps running in the browser. This slows down the entire process. Sometimes `browser.pages()` takes 10+ seconds. Working in a Lambda environment, this leads to unpredictable behavior and various errors.
Whenever we reach the timeout, it would be great, or even awesome, to have a way to stop the page immediately. I agree stopping the ongoing requests most likely won't help much, but that'd be better than nothing.
Here's an example:
You can see `await browser.pages()` took 20 seconds. What's worse, with Lambda the page can keep running in the browser even after the function is restarted. So the next event starts opening a new page and then that `context.newPage()` takes an enormous amount of time. The timeout is set to 25 seconds but the job took almost 47.
kduffie commented on Dec 14, 2023
Our product crawls our customer's website as part of our overall solution. We are using Puppeteer for this and, overall, it works great. But we have the same problem discussed here. We can't know a priori what the appropriate timeout behavior needs to be on any given page or site.
When page.goto throws a TimeoutError, it doesn't necessarily mean that the page is unusable -- but after catching the error we can't access the HttpResponse that is returned when there is no exception. If a new method, page.response(), for example, returned the response object if it is available, we'd be happy. I realize that in some timeout scenarios the response will not be available (such as if the timeout is at the network layer). It may also be a good idea for Puppeteer to emulate a "stop" when it throws an error, but I don't see that I need to be part of that.
So something like the following would be desirable:
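The proposed `page.response()` is not an existing Puppeteer method. As a rough illustration of the idea, the main document's response can be remembered from `response` events so that it is still reachable after `page.goto` rejects. `trackMainResponse` is a hypothetical helper name; the sketch assumes a Puppeteer and Node.js combination where `page.off` is available.

```javascript
// Hypothetical helper: remember the latest main-frame navigation response so
// it is still available if page.goto() later rejects with a timeout.
function trackMainResponse(page) {
  let latest = null;
  const onResponse = (response) => {
    const request = response.request();
    // Keep only responses for navigations of the top-level frame; on a
    // redirect chain the last one seen wins.
    if (request.isNavigationRequest() && request.frame() === page.mainFrame()) {
      latest = response;
    }
  };
  page.on('response', onResponse);
  return {
    response: () => latest, // null if no navigation response arrived
    dispose: () => page.off('response', onResponse),
  };
}
```

A caller would create the tracker before `page.goto`, catch the `TimeoutError`, and then read `tracker.response()`; it stays `null` when the timeout happened before any response arrived, as in the network-layer case mentioned above.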
fix(goto): abort page.goto after timeout
fix(goto): abort page.goto after timeout (#580)
Mahmoud-Skafi commented on Jun 24, 2024
for some reason this works for me:
thanks to @chigix
sharee-tech commented on Oct 1, 2024
I used .preventDefault() for this:
`const labelElement = await page.getByRole('link', { name: 'See The Lakes' });